In [130]:
%matplotlib nbagg
In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from planet4 import markings
In [3]:
id_ = 'bvc'
p4id = markings.ImageID(id_, scope='planet4')
In [4]:
d = dict(x=[280, 300, 320, 340], y=4*[300],
angle=[85, 265, 175, -5],
radius_1=[200, 200, 30, 30],
radius_2=[30, 30, 200, 200])
In [5]:
df = pd.DataFrame(d)
In [6]:
p4id.plot_blotches(blotches=df, lw=2)
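A while ago we had accounted for one symmetry of the blotch parameters (a rotation by 180 degrees), but not the other (swapping the two radii together with a rotation by 90 degrees). I then implemented an angle normalization like this: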
In [7]:
df = df.assign(angle_new=df.angle%180)
df
Out[7]:
This left the issue that identical ellipses can still end up with angles 90 degrees apart (85 versus 175 here), because the radii were still not sorted (some rows have radius_1 > radius_2, some the opposite).
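To check these symmetries numerically rather than by eye, one can compare each ellipse's 2x2 shape matrix R diag(r1^2, r2^2) R^T: two parameter sets describe the same centred ellipse exactly when their matrices are equal. A minimal sketch, assuming (not taken from planet4) that the angle gives the direction of the radius_1 axis; shape_matrix is a hypothetical helper:

```python
import numpy as np


def shape_matrix(radius_1, radius_2, angle_deg):
    """Return R @ diag(r1**2, r2**2) @ R.T for a centred ellipse.

    Hypothetical helper. Assumes angle_deg is the direction of the radius_1
    axis; two parameter sets describe the same ellipse iff the matrices match.
    """
    theta = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
    return rot @ np.diag([radius_1**2, radius_2**2]) @ rot.T


# the four blotches from the first DataFrame: (radius_1, radius_2, angle)
params = [(200, 30, 85), (200, 30, 265), (30, 200, 175), (30, 200, -5)]
reference = shape_matrix(*params[0])
same = [np.allclose(shape_matrix(*p), reference) for p in params]
print(same)  # [True, True, True, True]

# sanity check that the test discriminates: same radii, angle 175
print(np.allclose(shape_matrix(200, 30, 175), reference))  # False
```

All four rows are the same ellipse, yet after the modulo step their angles still read 85, 85, 175, 175, which is what confuses a clustering on the angle.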
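For the 90 degree symmetry, we realized that we can normalize by sorting the radii so that radius_1 is the larger one and, wherever a swap was required, rotating the angle by 90 degrees (the code below subtracts 90, which is equivalent to adding 90 once the angles are taken modulo 180):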
In [8]:
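def normalize_radii(df, angle_col='angle'):
    """Sort radii so that radius_1 >= radius_2, rotating the angle by -90 deg where swapped."""
    data = df.copy()
    # rows where the first radius is the smaller one
    idx = data.radius_1 < data.radius_2
    col_orig = ['radius_1', 'radius_2']
    col_reversed = list(reversed(col_orig))
    # swap the two radii for those rows (.values avoids pandas column alignment)
    data.loc[idx, col_orig] = data.loc[idx, col_reversed].values
    # the swapped-in axis is perpendicular to the old one, so rotate by 90 degrees
    data.loc[idx, angle_col] -= 90
    return data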
In [11]:
normed = normalize_radii(df, 'angle')
normed
Out[11]:
Applying the 180 degree normalization at this point should give us normalized ellipses that can be clustered together:
In [13]:
normed.angle = normed.angle % 180
In [14]:
normed
Out[14]:
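The same two-step normalization can also be written without pandas, which may be handy for testing it in isolation. A minimal NumPy sketch; canonical_ellipse is a hypothetical name, and it follows the rule used in normalize_radii above (subtract 90 where the radii get swapped, then take the angle modulo 180):

```python
import numpy as np


def canonical_ellipse(radius_1, radius_2, angle):
    """Hypothetical helper: map (radius_1, radius_2, angle) arrays to one
    canonical parameter set per ellipse.

    1. Sort radii so radius_1 >= radius_2, subtracting 90 degrees from the
       angle wherever a swap was needed (90 degree symmetry).
    2. Reduce the angle modulo 180 (180 degree symmetry).
    """
    r1 = np.asarray(radius_1, dtype=float)
    r2 = np.asarray(radius_2, dtype=float)
    ang = np.asarray(angle, dtype=float)
    swap = r1 < r2
    big = np.where(swap, r2, r1)
    small = np.where(swap, r1, r2)
    ang = np.where(swap, ang - 90, ang) % 180
    return big, small, ang


big, small, ang = canonical_ellipse([200, 200, 30, 30], [30, 30, 200, 200],
                                    [85, 265, 175, -5])
print(big, small, ang)  # [200. 200. 200. 200.] [30. 30. 30. 30.] [85. 85. 85. 85.]
```

NumPy's % follows Python's sign convention, so -95 % 180 gives 85 rather than -95.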
To test this repair more thoroughly, I create the 4 equivalent representations of each of 3 slightly different ellipses, for blotches whose semi-major axis lies around 0, around 45 and around 90 degrees, where I presume mathematical issues could occur.
Hence I create 4 blotches at -2 degrees, 4 at 0 and 4 at +2; after correction, the 4 blotches of each set should all show the same angle. The same goes for the blotches around 45 and 90 degrees.
In [15]:
x=[280, 300, 320, 340]
y=4*[300]
angle=[85, 265, 175, -5]
radius_1=[200, 200, 30, 30]
radius_2=[30, 30, 200, 200]
In [16]:
angle_groups = [[-2,0,2], [43,45,47],[88,90, 92]]
y_loc = [500, 300, 100]
x_loc = 300
In [204]:
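x = []
y = []
angle = []
rads1 = []
rads2 = []
for g, y_avg in zip(angle_groups, y_loc):
    for angle_val in g:
        # 4 equivalent representations: offsets 0 and 180 keep the radii order,
        # +90 and -90 go with the swapped radii (radius_1 < radius_2)
        for offset, r1, r2 in zip([0, 180, 90, -90], radius_1, radius_2):
            # jitter the centre by a few pixels around (x_loc, y_avg)
            x.append(x_loc + np.random.randint(-5, 5))
            y.append(y_avg + np.random.randint(-5, 5))
            angle.append(angle_val + offset)
            rads1.append(r1)
            rads2.append(r2)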
In [205]:
df = pd.DataFrame(dict(x=x, y=y, angle=angle, radius_1=rads1,
radius_2=rads2))
In [206]:
df
Out[206]:
In [207]:
p4id.plot_blotches(blotches=df, lw=2)
In [208]:
normed = normalize_radii(df)
Plotting the normalized blotches shows that all ellipses still look the same.
In [209]:
p4id.plot_blotches(blotches=normed, lw=2)
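The visual check can be backed up numerically. A sketch in plain NumPy that rebuilds the test set without the position jitter (which does not affect the shape), applies the same rule as normalize_radii followed by the modulo 180 step, and compares shape matrices; shape_matrix and normalize are hypothetical helpers, and the angle is assumed to give the direction of the first radius:

```python
import numpy as np


def shape_matrix(r1, r2, angle_deg):
    # Assumption: angle_deg is the direction of the r1 axis.
    t = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    return rot @ np.diag([r1**2, r2**2]) @ rot.T


def normalize(r1, r2, angle):
    # Same rule as normalize_radii followed by the modulo 180 step.
    if r1 < r2:
        r1, r2, angle = r2, r1, angle - 90
    return r1, r2, angle % 180


# Rebuild the (un-jittered) test set: 3 groups x 3 angles x 4 representations.
rows = []
for group in [[-2, 0, 2], [43, 45, 47], [88, 90, 92]]:
    for angle_val in group:
        for off, r1, r2 in zip([0, 180, 90, -90], [200, 200, 30, 30],
                               [30, 30, 200, 200]):
            rows.append((r1, r2, angle_val + off))

# 1) normalization never changes the ellipse itself
unchanged = [np.allclose(shape_matrix(*row), shape_matrix(*normalize(*row)))
             for row in rows]
# 2) the 4 representations of each ellipse collapse onto one parameter set
canon = [normalize(*row) for row in rows]
collapsed = [len(set(canon[i:i + 4])) == 1 for i in range(0, len(canon), 4)]
print(len(rows), all(unchanged), all(collapsed))  # 36 True True
```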
In [210]:
df.angle.values
Out[210]:
In [138]:
normed.angle.values
Out[138]:
Applying a modulo operation to the radius-normalized angles solves almost everything, apart from the case around 0, where I would like 178 to become -2 (which of course can't be done with a plain modulo 180, as that always returns values in [0, 180)):
In [111]:
(normed.angle % 180).values
Out[111]:
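Shifting the modulo range only moves the problem elsewhere. A small NumPy sketch (illustrative angles in degrees, not taken from the notebook output) mapping into [-90, 90) instead of [0, 180): 178 does become -2 as desired, but the blotches around 90 degrees now get split into 88 on one side and -90/-88 on the other.

```python
import numpy as np

angles = np.array([178, 0, 2, 88, 90, 92])

plain = angles % 180                  # range [0, 180): seam at 0/180
shifted = (angles + 90) % 180 - 90    # range [-90, 90): seam at +-90

print(plain)
print(shifted)
```

Any single linear interval of width 180 has a seam somewhere, which is why a wrap-aware representation is needed for clustering.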
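After long deliberation, and after wasting Anya's afternoon as well, I came to the conclusion that clustering on the sine of the angle is actually okay: because sin(x) = sin(180 - x), angles just above 0 and just below 180 map to nearly the same value, so the wrap-around problem disappears. (And I had already realised that during/after the Taiwan meeting, but did not have the full radius-based angle normalization in place then!)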
In [235]:
np.sin(np.deg2rad(normed.angle%180)).values
Out[235]:
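A quick check of why the sine removes the seam, and of what it gives up in exchange (plain NumPy; the angles are illustrative, not taken from the data):

```python
import numpy as np

angles = np.array([2.0, 178.0, 45.0, 135.0, 90.0])
sines = np.sin(np.deg2rad(angles))

# 2 and 178 are 176 apart numerically but only 4 degrees apart as
# orientations; in sine space they land on (almost) the same value.
print(np.round(sines, 4))
# The identity sin(x) == sin(180 - x) also maps 45 and 135 to the same
# value, so mirror-image orientations are not separated by the sine alone.
```

The second point is simply a property of the sine on [0, 180) and is worth keeping in mind when deciding which other features (e.g. position) go into the clustering.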
Below, the three sets of blotches cluster at 3 different locations in sine space.
In [237]:
input_angles = normed.angle % 180
sine_vals = np.sin(np.deg2rad(input_angles))
plt.figure()
# 1-D view: all points on the y=0 line, spread out along their sine values
plt.scatter(sine_vals, np.zeros_like(sine_vals));
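The clustering step itself is not shown in this notebook. As a stand-in (explicitly not the project's clustering algorithm), a minimal gap-based 1-D grouping in NumPy shows that the three sets separate cleanly in sine space. split_1d and eps=0.1 are hypothetical choices; the angles are the radius-normalized test-set values modulo 180, in the same order as the test set (4 rows per input angle):

```python
import numpy as np


def split_1d(values, eps):
    """Hypothetical helper: group 1-D values, starting a new group wherever
    the gap between consecutive sorted values exceeds eps.
    Returns one integer label per input value, in the original order."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    gaps = np.diff(values[order])
    sorted_labels = np.concatenate([[0], np.cumsum(gaps > eps)])
    labels = np.empty(len(values), dtype=int)
    labels[order] = sorted_labels
    return labels


# radius-normalized angles (mod 180) of the test set: 4 rows per input angle
angles = np.repeat([178, 0, 2, 43, 45, 47, 88, 90, 92], 4)
sines = np.sin(np.deg2rad(angles))
labels = split_1d(sines, eps=0.1)
print(np.unique(labels))  # [0 1 2]
```

Note that 178 and 2 end up in the same group as 0, which a clustering on the raw (mod 180) angles would not achieve.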
Afterwards, once I have the indices of the blotches that clustered together, I will use their ORIGINAL angles (i.e. the angles, not their sine values) to compute the average.
Fortunately, the circmean function from scipy.stats can be told what the wrap-around value is through its high parameter: 180 for blotch angles and 360 for fans.
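A sketch of that averaging step, assuming cluster labels are already available (the labels and angles below are made up for illustration). It contrasts the arithmetic mean, which lands near 90 for a cluster straddling 0/180, with scipy.stats.circmean using high=180 for blotches and high=360 for fans:

```python
import numpy as np
from scipy.stats import circmean

# Hypothetical cluster labels and radius-normalized blotch angles (degrees)
angles = np.array([176, 179, 1, 2, 43, 45, 47])
labels = np.array([0, 0, 0, 0, 1, 1, 1])

blotch_means = {}
for label in np.unique(labels):
    members = angles[labels == label]
    # high=180: blotch orientations repeat every 180 degrees
    blotch_means[int(label)] = circmean(members, high=180)
    # arithmetic mean vs circular mean for this cluster
    print(label, members.mean(), round(float(blotch_means[int(label)]), 2))

# Fans repeat every 360 degrees, so use high=360 there.
fan_mean = circmean([350, 20], high=360)
print(round(float(fan_mean), 2))
```

For cluster 0 the arithmetic mean is 89.5, a direction none of the members has, while the circular mean sits just below 180, i.e. right next to 0 on the 180-periodic circle.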
In [238]:
from scipy.stats import circmean
In [246]:
# this averages the first 12 objects, which jitter around 0,
# so the average should be 0 (equivalently 180, since high=180 wraps around).
circmean(normed.angle.iloc[:12] % 180, high=180).round()
Out[246]: